AITopics | readability assessment

Collaborating Authors

readability assessment

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Hierarchical Ranking Neural Network for Long Document Readability Assessment

Zheng, Yurui, Chen, Yijun, Zhang, Shaohong

arXiv.org Artificial IntelligenceNov-27-2025

Readability assessment aims to evaluate the reading di ffi culty of a text. In recent years, while deep learning technol - ogy has been gradually applied to readability assessment, m ost approaches fail to consider either the length of the text or the ordinal relationship of readability labels. This pap er proposes a bidirectional readability assessment mechan ism that captures contextual information to identify regions w ith rich semantic information in the text, thereby predicti ng the readability level of individual sentences. These sente nce-level labels are then used to assist in predicting the ov erall readability level of the document. Additionally, a pairwis e sorting algorithm is introduced to model the ordinal relationship between readability levels through label subtrac tion. Experimental results on Chinese and English datasets demonstrate that the proposed model achieves competitive p erformance and outperforms other baseline models. Introduction Automatic Text Readability (ARA) research originated in th e early 20th century, aiming to evaluate text reading di ffi culty and assist educators in recommending appropriate rea ding materials for learners [ 1 ]. Readability assessment approaches are generally classified into three paradig ms: human evaluation, co-selection-based analysis, and content-based analysis. Human evaluation involves expert annotation or reader surveys; co-selection methods leverage user interaction data such as reading time or choices [ 2 ]; and content-based approaches infer readability using linguistic, syntactic, or semantic features extracted fro m the text itself. Early studies predominantly relied on experts' subjective evaluations and simple statistical feat ures, such as sentence length and word complexity.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2511.21473

Country:

Europe (0.93)
Asia > China (0.46)
North America > United States (0.28)

Genre: Research Report (0.50)

Industry:

Health & Medicine (0.68)
Education > Educational Setting > K-12 Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Readability Reconsidered: A Cross-Dataset Analysis of Reference-Free Metrics

Belem, Catarina G, Glenn, Parker, Samuel, Alfy, Kumar, Anoop, Liu, Daben

arXiv.org Artificial IntelligenceOct-20-2025

Automatic readability assessment plays a key role in ensuring effective and accessible written communication. Despite significant progress, the field is hindered by inconsistent definitions of readability and measurements that rely on surface-level text properties. In this work, we investigate the factors shaping human perceptions of readability through the analysis of 897 judgments, finding that, beyond surface-level cues, information content and topic strongly shape text comprehensibility. Furthermore, we evaluate 15 popular readability metrics across five English datasets, contrasting them with six more nuanced, model-based metrics. Our results show that four model-based metrics consistently place among the top four in rank correlations with human judgments, while the best performing traditional metric achieves an average rank of 8.6. These findings highlight a mismatch between current readability metrics and human perceptions, pointing to model-based approaches as a more promising direction.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2510.15345

Country:

Asia (1.00)
North America > United States > California (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Education > Curriculum > Subject-Specific Education (1.00)
Education > Educational Setting > K-12 Education (0.95)
Health & Medicine > Consumer Health (0.93)
(5 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.98)

Add feedback

Lexicon-Enriched Graph Modeling for Arabic Document Readability Prediction

Elchafei, Passant, Osama, Mayar, Rageh, Mohamed, Abuelkheir, Mervat

arXiv.org Artificial IntelligenceSep-30-2025

We present a graph-based approach enriched with lexicons to predict document-level readability in Arabic, developed as part of the Constrained Track of the BAREC Shared Task 2025. Our system models each document as a sentence-level graph, where nodes represent sentences and lemmas, and edges capture linguistic relationships such as lexical co-occurrence and class membership. Sentence nodes are enriched with features from the SAMER lexicon as well as contextual embeddings from the Arabic transformer model. The graph neural network (GNN) and transformer sentence encoder are trained as two independent branches, and their predictions are combined via late fusion at inference. For document-level prediction, sentence-level outputs are aggregated using max pooling to reflect the most difficult sentence. Experimental results show that this hybrid method outperforms standalone GNN or transformer branches across multiple readability metrics. Overall, the findings highlight that fusion offers advantages at the document level, but the GNN-only approach remains stronger for precise prediction of sentence-level readability.

machine learning, natural language, prediction, (17 more...)

arXiv.org Artificial Intelligence

2509.2287

Country:

Europe > Austria (0.28)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Add feedback

mucAI at BAREC Shared Task 2025: Towards Uncertainty Aware Arabic Readability Assessment

Abdou, Ahmed

arXiv.org Artificial IntelligenceSep-22-2025

We present a simple, model-agnostic post-processing technique for fine-grained Arabic readability classification in the BAREC 2025 Shared Task (19 ordinal levels). Our method applies conformal prediction to generate prediction sets with coverage guarantees, then computes weighted averages using softmax-renormalized probabilities over the conformal sets. This uncertainty-aware decoding improves Quadratic Weighted Kappa (QWK) by reducing high-penalty misclassifications to nearer levels. Our approach shows consistent QWK improvements of 1-3 points across different base models. In the strict track, our submission achieves QWK scores of 84.9\%(test) and 85.7\% (blind test) for sentence level, and 73.3\% for document level. For Arabic educational assessment, this enables human reviewers to focus on a handful of plausible levels, combining statistical guarantees with practical usability.

machine learning, natural language, prediction, (16 more...)

arXiv.org Artificial Intelligence

2509.15485

Country: Europe > Austria > Vienna (0.14)

Genre: Research Report (0.64)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

!MSA at BAREC Shared Task 2025: Ensembling Arabic Transformers for Readability Assessment

Basem, Mohamed, Younes, Mohamed, Ahmed, Seif, Moustafa, Abdelrahman

arXiv.org Artificial IntelligenceSep-15-2025

We present MSAs winning system for the BAREC 2025 Shared Task on fine-grained Arabic readability assessment, achieving first place in six of six tracks. Our approach is a confidence-weighted ensemble of four complementary transformer models (AraBERTv2, AraELECTRA, MARBERT, and CAMeLBERT) each fine-tuned with distinct loss functions to capture diverse readability signals. To tackle severe class imbalance and data scarcity, we applied weighted training, advanced preprocessing, SAMER corpus relabeling with our strongest model, and synthetic data generation via Gemini 2.5 Flash, adding about 10,000 rare-level samples. A targeted post-processing step corrected prediction distribution skew, delivering a 6.3 percent Quadratic Weighted Kappa (QWK) gain. Our system reached 87.5 percent QWK at the sentence level and 87.4 percent at the document level, demonstrating the power of model and loss diversity, confidence-informed fusion, and intelligent augmentation for robust Arabic readability prediction.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2509.1004

Country:

Asia (0.68)
Europe > Austria > Vienna (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)

Add feedback

A Large and Balanced Corpus for Fine-grained Arabic Readability Assessment

Elmadani, Khalid N., Habash, Nizar, Taha-Thomure, Hanada

arXiv.org Artificial IntelligenceFeb-19-2025

This paper introduces the Balanced Arabic Readability Evaluation Corpus BAREC, a large-scale, fine-grained dataset for Arabic readability assessment. BAREC consists of 68,182 sentences spanning 1+ million words, carefully curated to cover 19 readability levels, from kindergarten to postgraduate comprehension. The corpus balances genre diversity, topical coverage, and target audiences, offering a comprehensive resource for evaluating Arabic text complexity. The corpus was fully manually annotated by a large team of annotators. The average pairwise inter-annotator agreement, measured by Quadratic Weighted Kappa, is 81.3%, reflecting a high level of substantial agreement. Beyond presenting the corpus, we benchmark automatic readability assessment across different granularity levels, comparing a range of techniques. Our results highlight the challenges and opportunities in Arabic readability modeling, demonstrating competitive performance across various methods. To support research and education, we will make BAREC openly available, along with detailed annotation guidelines and benchmark results.

artificial intelligence, machine learning, natural language, (12 more...)

arXiv.org Artificial Intelligence

2502.1352

Country:

Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.05)
Europe > Slovenia (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(18 more...)

Genre: Research Report > New Finding (0.88)

Industry: Education > Educational Setting > K-12 Education > Primary School (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.46)

Add feedback

A study of Vietnamese readability assessing through semantic and statistical features

Le, Hung Tuan, To, Long Truong, Nguyen, Manh Trong, Nguyen, Quyen, Do, Trong-Hop

arXiv.org Artificial IntelligenceNov-7-2024

Determining the difficulty of a text involves assessing various textual features that may impact the reader's text comprehension, yet current research in Vietnamese has only focused on statistical features. This paper introduces a new approach that integrates statistical and semantic approaches to assessing text readability. Our research utilized three distinct datasets: the Vietnamese Text Readability Dataset (ViRead), OneStopEnglish, and RACE, with the latter two translated into Vietnamese. Advanced semantic analysis methods were employed for the semantic aspect using state-of-the-art language models such as PhoBERT, ViDeBERTa, and ViBERT. In addition, statistical methods were incorporated to extract syntactic and lexical features of the text. We conducted experiments using various machine learning models, including Support Vector Machine (SVM), Random Forest, and Extra Trees and evaluated their performance using accuracy and F1 score metrics. Our results indicate that a joint approach that combines semantic and statistical features significantly enhances the accuracy of readability classification compared to using each method in isolation. The current study emphasizes the importance of considering both statistical and semantic aspects for a more accurate assessment of text difficulty in Vietnamese. This contribution to the field provides insights into the adaptability of advanced language models in the context of Vietnamese text readability. It lays the groundwork for future research in this area.

dataset, language model, readability, (16 more...)

arXiv.org Artificial Intelligence

2411.04756

Country:

Asia > Vietnam > Hồ Chí Minh City > Hồ Chí Minh City (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(2 more...)

Genre: Research Report > New Finding (0.66)

Industry: Education > Educational Setting > K-12 Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback

Strategies for Arabic Readability Modeling

Liberato, Juan Piñeros, Alhafni, Bashar, Khalil, Muhamed Al, Habash, Nizar

arXiv.org Artificial IntelligenceJul-3-2024

Automatic readability assessment is relevant to building NLP applications for education, content analysis, and accessibility. However, Arabic readability assessment is a challenging task due to Arabic's morphological richness and limited readability resources. In this paper, we present a set of experimental results on Arabic readability assessment using a diverse range of approaches, from rule-based methods to Arabic pretrained language models. We report our results on a newly created corpus at different textual granularity levels (words and sentence fragments). Our results show that combining different techniques yields the best results, achieving an overall macro F1 score of 86.7 at the word level and 87.9 at the fragment level on a blind test set. We make our code, data, and pretrained models publicly available.

computational linguistic, proceedings, readability assessment, (12 more...)

arXiv.org Artificial Intelligence

2407.03032

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.05)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
(23 more...)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

An Open Multilingual System for Scoring Readability of Wikipedia

Trokhymovych, Mykola, Sen, Indira, Gerlach, Martin

arXiv.org Artificial IntelligenceJun-3-2024

With over 60M articles, Wikipedia has become the largest platform for open and freely accessible knowledge. While it has more than 15B monthly visits, its content is believed to be inaccessible to many readers due to the lack of readability of its text. However, previous investigations of the readability of Wikipedia have been restricted to English only, and there are currently no systems supporting the automatic readability assessment of the 300+ languages in Wikipedia. To bridge this gap, we develop a multilingual model to score the readability of Wikipedia articles. To train and evaluate this model, we create a novel multilingual dataset spanning 14 languages, by matching articles from Wikipedia to simplified Wikipedia and online children encyclopedias. We show that our model performs well in a zero-shot scenario, yielding a ranking accuracy of more than 80% across 14 languages and improving upon previous benchmarks. These results demonstrate the applicability of the model at scale for languages in which there is no ground-truth data available for model fine-tuning. Furthermore, we provide the first overview on the state of readability in Wikipedia beyond English.

dataset, readability, wikipedia, (13 more...)

arXiv.org Artificial Intelligence

2406.01835

Country:

North America > United States > Washington > King County > Seattle (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > New York > New York County > New York City (0.04)
(14 more...)

Genre: Research Report > New Finding (0.88)

Industry:

Health & Medicine (0.67)
Education (0.46)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

InterEvo-TR: Interactive Evolutionary Test Generation With Readability Assessment

Delgado-Pérez, Pedro, Ramírez, Aurora, Valle-Gómez, Kevin J., Medina-Bulo, Inmaculada, Romero, José Raúl

arXiv.org Artificial IntelligenceJan-13-2024

Automated test case generation has proven to be useful to reduce the usually high expenses of software testing. However, several studies have also noted the skepticism of testers regarding the comprehension of generated test suites when compared to manually designed ones. This fact suggests that involving testers in the test generation process could be helpful to increase their acceptance of automatically-produced test suites. In this paper, we propose incorporating interactive readability assessments made by a tester into EvoSuite, a widely-known evolutionary test generation tool. Our approach, InterEvo-TR, interacts with the tester at different moments during the search and shows different test cases covering the same coverage target for their subjective evaluation. The design of such an interactive approach involves a schedule of interaction, a method to diversify the selected targets, a plan to save and handle the readability values, and some mechanisms to customize the level of engagement in the revision, among other aspects. To analyze the potential and practicability of our proposal, we conduct a controlled experiment in which 39 participants, including academics, professional developers, and student collaborators, interact with InterEvo-TR. Our results show that the strategy to select and present intermediate results is effective for the purpose of readability assessment. Furthermore, the participants' actions and responses to a questionnaire allowed us to analyze the aspects influencing test code readability and the benefits and limitations of an interactive approach in the context of test case generation, paving the way for future developments based on interactivity.

interaction, participant, test case, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/TSE.2022.3227418

2401.07072

Country:

Europe > Spain > Andalusia > Cádiz Province > Cadiz (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > Spain > Andalusia > Córdoba Province > Córdoba (0.04)
(3 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
Information Technology > Software Engineering (0.69)

Add feedback